1. INTRODUCTION

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus that causes coronavirus
disease 2019 (COVID-19). According to the WHO, the symptoms of COVID-19 vary, ranging from mild
(such as fever, cough, and loss of taste or smell) to serious (such as difficulty breathing, loss of
speech, and chest pain) [1]. The first case of COVID-19 was confirmed in Wuhan, China in November 2019,
and the virus started to spread outside China in January 2020 [2]. Since then, the virus has spread rapidly
throughout the world. As of June 7, 2022, there were a total of 536,021,397 confirmed cases and 6,321,719
deaths worldwide as a result of COVID-19 [3].

Indonesia ranked 19th in the world for the highest number of confirmed COVID-19 cases [3],
and 6th in Asia. As of December 12, 2022, the number of infections had reached 6,704,268 cases, with a total
of 160,311 deaths [4]. The first case of COVID-19 in Indonesia was confirmed on March 2, 2020 [5]. In
response, the government tried to prevent the spread of the virus by implementing various levels of lockdown
policies [6] and mandating vaccination. Because preventing the spread of COVID-19 is critical,
scientists began developing COVID-19 vaccines [7], which are now used in many countries around the world.
President Joko Widodo was the first to receive a COVID-19 vaccination in Indonesia, on January 13, 2021 [8].
To promote vaccination coverage, the Indonesian Ministry of Health lists thirteen types of approved
vaccines [9]; the five most commonly used vaccines in Indonesia are Sinovac, AstraZeneca, Moderna,
Sinopharm, and Pfizer. To date, Indonesia's COVID-19 vaccine coverage has reached 401,308,016 doses [10].

Twitter is one of the most popular social media platforms in Indonesia. Indonesia ranks fifth in
the world in terms of Twitter users (18.45 million) [11]. This massive amount of tweet data can be used to
extract useful information, such as the public sentiment in Indonesia toward COVID-19 vaccines. We
contend that this information can help the Indonesian government make sound decisions. For
example, because the Indonesian government is currently encouraging the public to get “booster vaccines,”
the findings of this study may be useful for health policymakers in determining which vaccines are
appropriate for use.

In this study, we conduct sentiment analysis of public perception toward the five most
commonly used COVID-19 vaccines in Indonesia, i.e., Sinovac, AstraZeneca, Moderna, Sinopharm, and
Pfizer. The resulting sentiment scores are then used for temporal, geographical, and correlation analysis.
Temporal and geographical analyses are performed to identify the variation in sentiment for each type of
vaccine across time and geography, respectively. Correlation analysis is then conducted to identify the
relationships between some potentially related variables.

This study aims to analyze public sentiment toward COVID-19 vaccination, focusing on the
five most popular vaccine types in Indonesia. We raise four research questions: i) What is the
public perception of the five types of vaccines in Indonesia?; ii) Does public sentiment toward each
vaccine in Indonesia change over time?; iii) Does public sentiment differ by geographic location
in Indonesia?; and iv) Is there any significant correlation between related variables (total tweets,
sentiment score, new cases, new deaths)? To answer these research questions, we collected a Twitter dataset
from January 2020 to December 2021 comprising a total of 280,826 tweets categorized by vaccine type.


2. RELATED WORKS 

Many previous studies on sentiment analysis of COVID-19 focus on improving the
accuracy of sentiment classification models [12]-[15]. In contrast, this study does not develop a
classification model; instead, we analyze public sentiment toward five types of COVID-19 vaccines using an
available sentiment analysis tool and look deeper into temporal and geographical perspectives.

Analyzing public sentiment using existing sentiment analysis tools or lexicons has also been
explored in previous work. Yousef et al. [16] examined the effect of public health campaigns and COVID-19
related events on sentiment and vaccine uptake, identifying sentiment with an AI-based tool,
BytesView. Shim et al. [17] analyzed changes in public perception of COVID-19 vaccines in Korea using a
Korean sentiment lexicon. Wang et al. [18] and Melton et al. [19] examined public sentiments and
opinions regarding the COVID-19 vaccine using TextBlob. In our study, the Valence Aware Dictionary and
Sentiment Reasoner (VADER) is used to analyze Indonesian public sentiment toward COVID-19
vaccines.

The previous study most closely related to ours is that of Liu et al. [20], who analyzed public attitudes
toward COVID-19 vaccines over a three-month period using English-language tweets and the sentiment
analysis tool VADER [21]. Their analysis focused on the COVID-19 vaccine in general and did not examine
changes in sentiment toward specific types of vaccines. Our study differs from theirs in that we analyze public
perception of specific types of COVID-19 vaccines, rather than COVID-19 vaccines in general. Furthermore,
while Liu et al. analyzed sentiment in English-speaking countries using English-language tweets over a
three-month period, we analyzed sentiment in Indonesia using tweets written in Bahasa Indonesia over a
two-year period. Their geographical analysis was conducted at the country level, whereas ours
was conducted at the province level.

Sentiment analysis of COVID-19 vaccines has received a lot of attention in Indonesia [22]-[32]. All
of these analyses, however, focus on developing more accurate classification models using machine
learning and deep learning. A few previous studies have examined Indonesian attitudes
toward specific types of vaccines [23], [25], [26]. However, these studies covered only a few types of vaccines
and a short time interval, and did not perform temporal, geographical, or correlation analysis. In this study, we
use a broad range of vaccines and data from a long time interval. Saadah et al. [24], who worked on
classifying the sentiment of Indonesian vaccine tweets, did conduct a geographic analysis by
province in Indonesia; however, they focused on observing the opinion polarity toward free and paid vaccination
programs, whereas our focus is public sentiment toward the COVID-19 vaccines themselves.
3. RESEARCH METHOD
3.1. Data collection 

We used the Twitter API for Academic Research and the Python library Tweepy to collect
Indonesian-language tweets about COVID-19 vaccines from January 1, 2020 to December 31, 2021. We retrieved
only non-retweeted tweets with the language code “in”, which means that Twitter identified their text as
Bahasa Indonesia. To crawl tweets about Sinovac, Moderna, Sinopharm, and Pfizer, we used the
queries “vaksin sinovac”, “vaksin moderna”, “vaksin sinopharm”, and “vaksin pfizer”,
respectively. Because AstraZeneca is frequently spelled as AstraZeneca, Astra Zeneca, or AZ, our query to
collect tweets about this vaccine is ((vaksin astrazeneca) OR (vaksin astra zeneca) OR (vaksin az)). Note that
“vaksin” is the Indonesian word for vaccine; we included that term at the beginning of our queries to
filter out tweets that may not be in Bahasa Indonesia or are not related to the vaccine. In total,
our queries retrieved 378,697 tweets. We then filtered out tweets that discuss more than one type of
vaccine, which resulted in a total of 280,826 tweets.
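
The single-vaccine filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the alias lists, the helper names `mentioned_vaccines` and `keep_single_vaccine`, and whole-word matching are all introduced here for illustration, since the exact matching rules used in the study are not given.

```python
import re

# Hypothetical alias list per vaccine (assumption; the study's exact rules are not stated).
VACCINE_ALIASES = {
    "sinovac": ["sinovac"],
    "astrazeneca": ["astrazeneca", "astra zeneca", "az"],
    "moderna": ["moderna"],
    "sinopharm": ["sinopharm"],
    "pfizer": ["pfizer"],
}


def mentioned_vaccines(text):
    """Return the set of vaccine types whose alias appears as a whole word or phrase."""
    lowered = text.lower()
    found = set()
    for vaccine, aliases in VACCINE_ALIASES.items():
        for alias in aliases:
            if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
                found.add(vaccine)
                break
    return found


def keep_single_vaccine(tweets):
    """Keep (text, vaccine) pairs for tweets that mention exactly one vaccine type."""
    kept = []
    for text in tweets:
        vaccines = mentioned_vaccines(text)
        if len(vaccines) == 1:
            kept.append((text, next(iter(vaccines))))
    return kept
```

Tweets that mention two or more vaccine types, or none at all, are dropped, so each remaining tweet can be assigned to a single vaccine category.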


3.2. Sentiment analysis 

A sentiment analysis tool, VADER, is used to assign a sentiment score to each tweet [21]. VADER
was chosen because it has been demonstrated to perform well for sentiment analysis on social media datasets
such as Twitter [20]. In a previous study, VADER was shown to outperform human annotators in predicting
the sentiments of tweets [21]. VADER generates a sentiment score (compound score) for a given
text, which is then used to classify the tweet as positive (compound score >= 0.05), negative (compound score
<= -0.05), or neutral (-0.05 < compound score < 0.05). Because VADER operates on English text, the tweets in our
dataset were first translated into English using the Python library googletrans, which uses the Google
Translate API to perform translation.
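
The thresholding step can be illustrated with a short sketch. It assumes the compound score has already been computed for each translated tweet (for example, with the `vaderSentiment` package); the function names `classify_compound` and `sentiment_counts` are hypothetical and introduced here only for illustration.

```python
from collections import Counter


def classify_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_counts(compound_scores):
    """Count how many scores fall into each sentiment class."""
    return Counter(classify_compound(score) for score in compound_scores)
```

For instance, `sentiment_counts([0.6, 0.0, -0.3, 0.02])` yields two neutral tweets, one positive tweet, and one negative tweet.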


3.3. Temporal analysis 

Temporal analysis is performed using the Pruned Exact Linear Time (PELT) algorithm [33] to
determine the trend of sentiment scores. Here, PELT is applied to detect change points in the sentiment
scores over time. PELT finds change points by minimizing a cost function, and it can detect more
change points than other change point detection methods [33]. PELT has been
shown to be a precise method with high sensitivity for detecting change points [34], and
it is efficient and accurate [33]. Before PELT is applied, following Liu et al. [20], a 14-day moving
average is applied to smooth out fluctuations in the sentiment scores and obtain their trends.
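
To make the two steps concrete, the sketch below smooths a daily series with a trailing moving average and then runs a small PELT implementation. It is illustrative only: the least-squares (mean-shift) segment cost, the `penalty` value, and the function names are assumptions made here, as the cost function and penalty used in the study are not stated.

```python
def moving_average(values, window=14):
    """Trailing moving average; the first window-1 points have no full window and are dropped."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


def pelt_mean_shift(signal, penalty):
    """Detect change points with PELT using a least-squares cost (assumed cost function).

    Returns sorted indices t such that a new segment starts at signal[t].
    """
    n = len(signal)
    # Prefix sums give each segment cost in O(1).
    sum1 = [0.0] * (n + 1)
    sum2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        sum1[i + 1] = sum1[i] + x
        sum2[i + 1] = sum2[i] + x * x

    def cost(start, end):
        """Sum of squared deviations from the mean over signal[start:end]."""
        seg_sum = sum1[end] - sum1[start]
        return (sum2[end] - sum2[start]) - seg_sum * seg_sum / (end - start)

    best = [0.0] * (n + 1)      # best[t]: minimal penalized cost of signal[:t]
    best[0] = -penalty
    last_change = [0] * (n + 1)  # start index of the final segment in the optimum for signal[:t]
    candidates = [0]
    for t in range(1, n + 1):
        totals = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last_change[t] = min(totals)
        # Pruning step: discard starts that can never be optimal for any later t.
        candidates = [s for total, s in totals if total - penalty <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    change_points = []
    t = n
    while last_change[t] > 0:
        t = last_change[t]
        change_points.append(t)
    return sorted(change_points)
```

A larger `penalty` yields fewer change points, so in practice this value has to be tuned to the noise level of the smoothed series.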


3.4. Geographical analysis 

To examine the variation in sentiment across Indonesian geography, we identify tweets that
contain the user’s province information using a simple lookup against a list of Indonesian provinces and cities
that we built. The resulting tweets are then grouped by province. Because Indonesia
had 34 provinces in 2021, there are 34 province categories. The sentiment scores of the tweets in each province were
then calculated. Furthermore, we applied one-way ANOVA using the SPSS tool to examine whether vaccine type
had a significant effect on sentiment scores.
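
The lookup and grouping step might look like the sketch below. The lookup table is a tiny hypothetical excerpt (the study's full list of provinces and cities is not reproduced here), and matching by substring on a free-text location string is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical excerpt of a city/province lookup table (assumption, not the study's list).
LOCATION_TO_PROVINCE = {
    "jakarta": "DKI Jakarta",
    "bandung": "Jawa Barat",
    "jawa barat": "Jawa Barat",
    "surabaya": "Jawa Timur",
    "jawa timur": "Jawa Timur",
    "ambon": "Maluku",
    "maluku": "Maluku",
}


def province_of(location):
    """Map a free-text location to a province name, or None if no term matches."""
    lowered = location.lower()
    for term, province in LOCATION_TO_PROVINCE.items():
        if term in lowered:
            return province
    return None


def mean_score_by_province(records):
    """Average the sentiment score per province.

    records: iterable of (location_text, compound_score) pairs.
    Records whose location cannot be mapped are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for location, score in records:
        province = province_of(location)
        if province is None:
            continue
        totals[province][0] += score
        totals[province][1] += 1
    return {province: total / count for province, (total, count) in totals.items()}
```

A plain substring match is ambiguous for names that contain one another, so a production lookup would need to check longer names first.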


3.5. Correlation analysis 

Pearson correlation is used to analyze the correlation between some potentially related variables,
such as the number of confirmed cases, number of deaths, number of tweets, and sentiment scores. These
variables were calculated for each month, and the correlation was then calculated from these monthly values. The
information about the number of new COVID-19 cases and new COVID-19 deaths in
Indonesia in 2020-2021 was obtained from the official WHO dashboard for COVID-19.
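
For reference, the Pearson coefficient between two monthly series can be computed as in the sketch below. The function name `pearson_r` is hypothetical; the sketch returns only the coefficient and does not compute the p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

Here `xs` could hold, for example, the monthly tweet counts and `ys` the monthly new cases.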


4. RESULT 
4.1. Total tweets for each vaccine type 

A total of 280,826 tweets covering five different types of vaccines and posted
between January 1, 2020 and December 31, 2021 were collected. Table 1 summarizes the total tweets for each vaccine. In
general, the number of tweets discussing COVID-19 vaccines in 2021 is higher than that in 2020
because the vaccination rollout in Indonesia started in January 2021. The vaccine most
discussed by the Indonesian public is Sinovac, followed by AstraZeneca. These two vaccines
are in fact the earliest vaccines used in Indonesia. Sinovac was first used in Indonesia on January 13,
2021 [10], and AstraZeneca around March-April 2021.
Table 1. Statistics of total tweets for each vaccine
                        Total tweets
No     Vaccine type     2020       2021       Overall
1      AstraZeneca      2,485      71,924     74,409
2      Moderna          1,863      24,493     26,356
3      Pfizer           6,407      33,672     40,079
4      Sinopharm        599        7,484      8,083
5      Sinovac          16,667     115,232    131,899
       Total            28,021     252,805    280,826


In addition, among the five types of COVID-19 vaccines studied in this work, Indonesia received the
highest number of doses of these two vaccines (Sinovac and AstraZeneca). Sinopharm has the lowest number
of tweets because, among the five vaccines, it is the least used in Indonesia.


4.2. The results of sentiment analysis 

Table 2 shows the average sentiment score for each type of vaccine. Overall,
39% of tweets are positive (108,817 tweets), 18% are negative (51,384 tweets), and 43% are neutral
(120,613 tweets). The average sentiment score calculated by VADER for every vaccine type
is above 0.05, indicating that the sentiment toward all vaccines is positive. Sinopharm and Pfizer had the highest
average sentiment scores, while AstraZeneca had the lowest. This suggests that Sinopharm and Pfizer were
the two most preferred vaccines in Indonesia, while AstraZeneca was the least preferred.


Table 2. Average sentiment score for each vaccine type
                      Average sentiment score
Vaccine type          2020       2021       Overall
AstraZeneca           0.200      0.151      0.152
Moderna               0.407      0.204      0.220
Pfizer                0.313      0.216      0.233
Sinopharm             0.348      0.251      0.258
Sinovac               0.224      0.204      0.207
Overall average       0.298      0.205      0.214


The total tweets in each sentiment category for each vaccine are presented in Figure 1. We can see
that, in general, for each vaccine the number of neutral tweets is the highest, followed by the number of
positive tweets, then the negative tweets. This indicates that all COVID-19 vaccines receive mostly neutral and
positive sentiment from the Indonesian public.

Figure 1. Total tweets by sentiment classification

We performed a one-way ANOVA on sentiment scores across all vaccine types to test for
statistically significant differences in sentiment scores between vaccine types. The results show that the type
of vaccine had a significant effect on sentiment scores (F(4,280826)=91.690, p<0.001). A Tukey post-hoc
test was then performed to determine which pairs of vaccine types had significantly different sentiment
scores. The result shows that AstraZeneca differs significantly in sentiment score from every other
vaccine type, which indicates that Sinopharm, Pfizer, Sinovac, and Moderna are significantly more
preferred than AstraZeneca. Sinopharm also differs significantly in sentiment
score from all other vaccines, indicating that the Indonesian public perceives it more positively than
the other vaccines.
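
For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed from grouped sentiment scores as sketched below. This is an illustrative calculation only (the analysis in this paper was run in SPSS); the function name `one_way_anova_f` is hypothetical, and the p-value and the Tukey post-hoc comparison are not computed here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric scores."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group variation: spread of the scores around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Each inner list would hold the compound scores of the tweets about one vaccine type (or, for the geographical analysis, one province).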


4.3. The results of temporal analysis 

In this section, we describe the change points for each type of vaccine and the reasons
behind them. The change points are shown in Figure 2. The blue lines illustrate the 14-day
moving average of sentiment scores for each vaccine, while the vertical red lines illustrate the detected
change points. The change points indicate where sentiment scores increased or decreased significantly.

Figure 2. Change points of sentiment scores across time

For the Sinovac vaccine, the first change point occurred on July 27, 2020, when we identified an
increase in sentiment score. This shift occurred as a result of tweets claiming that Sinovac had no serious side
effects and was proven to be safe. The next change point occurred on August 31, 2020, indicating a decrease
in sentiment score, with most of the discussion centered on Sinovac being out of stock.

For AstraZeneca, the first change point was observed on September 12, 2020, when the sentiment
score decreased drastically. This was due to issues surrounding AstraZeneca vaccination, such as the side
effects of AstraZeneca, disbelief in its efficacy, and misleading information that AstraZeneca caused
blindness. These factors collectively led to a sharp decline in public trust and confidence in the AstraZeneca
vaccine, triggering a notable shift in public sentiment toward the vaccine.

For the Pfizer vaccine, the first change point was observed on December 10, 2020. There were
several tweets about the Pfizer vaccine providing only an average amount of antibodies, implying that some
Pfizer vaccine recipients have lower antibody levels and less protection. This discussion caused the sentiment
score to fall. The sentiment score stayed negative until the beginning of 2021 because of many reports
of serious side effects of the Pfizer vaccine, ranging from allergic reactions to death.

The first change point for Moderna occurred on May 20, 2020, when we observed that the sentiment score was
gradually increasing. The majority of the tweets discussed the third phase of the Moderna clinical trial [35] and
reports that Moderna was shown to produce antibodies. However, in 2021 several change points showed decreases
because Moderna was reserved for health workers and was inaccessible to the general public, which caused
public anger.

For Sinopharm, the first change point occurred on November 13, 2020, when we noticed a decrease
in its sentiment score, with most of the discussion centered on the collaboration between Indonesia and
China on vaccine production, about which some people expressed negative views of China. The sentiment score
later increased: it rose on November 27, 2021 in response to positive tweets
claiming that Sinopharm had obtained an emergency use permit for 2021 [36].


4.4. The results of geographical analysis 

Overall, there are 91,379 tweets with province location
information out of a total of 280,814 tweets. To obtain the tweets that contain a province location, we first
identified tweets that contain the names of cities or provinces in Indonesia. We then mapped cities to their
provinces, so that every location tag corresponds to one of the 34 provinces in Indonesia. Finally, we obtained
34 tweet groups, one per province. Jakarta has the highest number of tweets, followed by West Java, East Java,
Central Java, and Yogyakarta, while West Papua has the lowest number of tweets.

Figure 3 depicts the distribution of sentiment scores across all provinces in Indonesia. The stronger the
color, the higher the sentiment score, that is, the more positive the attitude of the public toward COVID-19
vaccines. The sentiments
in all provinces (except West Papua) are classified as positive, as they are larger than 0.05. This indicates that
the Indonesian public in every province tends to react positively to the COVID-19 vaccines. The sentiment score
of West Papua may be inaccurate since this province has only 6 tweets.
Figure 3. Sentiment score distribution in every province

The figure shows that the province with the strongest blue color is Maluku, with a
sentiment score of 0.498. It is followed by West Sumatera (Sumatera Barat), West Sulawesi (Sulawesi
Barat), Central Kalimantan (Kalimantan Tengah), and Gorontalo, with sentiment scores of
0.4731, 0.4731, 0.4558, and 0.4249, respectively. The result of the one-way ANOVA test shows that there are
significant differences in sentiment scores between provinces (F(33, 91379)=91.690, p<0.001). This indicates
that province has a significant effect on the sentiment scores.


4.5. The results of correlation analysis 

The correlation analysis examines the correlation of the total number of tweets and the total
numbers of positive, negative, and neutral tweets with new deaths and new cases, followed by the relationship
of sentiment score with new deaths and new cases. We discovered a strong positive correlation between i)
total tweets and new cases; ii) total tweets and new deaths; iii) total positive tweets and new cases; iv) total
positive tweets and new deaths; v) total negative tweets and new cases; vi) total negative tweets and new
deaths; vii) total neutral tweets and new cases; viii) total neutral tweets and new deaths. This indicates
that the number of tweets increased approximately linearly with the number of new confirmed cases and
confirmed deaths. However, no significant correlation was found between sentiment scores and new deaths
or new cases, since the p-values are greater than 0.05.


5. DISCUSSION 

This work used Twitter data to observe people’s opinions and sentiment regarding COVID-19
vaccines in Indonesia. We applied a sentiment analysis tool, VADER, to classify the tweets into three classes:
positive, negative, and neutral. The results show that our dataset is dominated by neutral tweets at
43% (120,613 tweets), followed by positive tweets at 39% (108,817 tweets) and negative tweets at 18% (51,384
tweets). The number of tweets started to increase drastically in 2021, when people discussed
vaccination more than in the previous year. This is reasonable because the vaccination rollout in Indonesia started
in January 2021 [8].

Our results show that the majority of Indonesians tend to react neutrally to vaccine policy
even though the vaccination program was surrounded by misleading information [37]. Furthermore,
vaccination coverage in Indonesia has reached 60% of the total population [4]. Our result differs from those of
Wang et al. [18] and Melton et al. [19], who reported that, in general, the public in the United States of America
tends to react positively toward COVID-19 vaccines and that positive sentiment increased as more people got
vaccinated. Our results also differ from those of Yousef et al. [16], who demonstrated that negative sentiment
dominated their Twitter dataset on the Australian vaccination program. Nevertheless, our result is
similar to that of Choi et al. [38], who demonstrated that public sentiment toward COVID-19 vaccines in South
Korea was dominated by neutral sentiment.

We applied PELT to observe the change points of sentiment within 2020-2021. We
identified 10 change points on average for each type of vaccine. This is close to the finding of
Liu et al. [20], who also applied PELT in their study and identified 8 change points. Although the dataset of
Liu et al. [20] spanned four months (November 2020 to February 2021) while ours spanned 24 months
(January 2020 to December 2021), we found a comparable number of change points. In our research,
several change points coincided with news about vaccine efficacy, which resulted in sharp increases in sentiment
score. This is similar to Liu et al. [20], who detected a change point when Pfizer was announced to be
90% effective.

Furthermore, we conducted a geographic analysis to obtain the sentiment polarity by province in
Indonesia. Overall, we mapped our dataset to 34 provinces in Indonesia. We found that Jakarta ranked
first among the provinces in total tweets. As the capital city of Indonesia, Jakarta is a center of
information where Twitter users actively discuss the latest vaccination information; its
sentiment is also dominated by positive sentiment. In this respect, our result is similar to that of Liu et al.
[20], who stated that Washington DC, the capital city of the United States, was dominated by positive sentiment.
This shows that the capital cities of both the United States and Indonesia had predominantly positive
sentiment scores.

Finally, we applied Pearson correlation analysis to seven variables:
total tweets, total positive tweets, total negative tweets, total neutral tweets, sentiment score, new cases, and
new deaths. Shim et al. [17] used similar variables in a correlation analysis
of social media data in Korea, but our results differ from theirs. We discovered a strong correlation
between total tweets and new cases (r=0.9, p=0.001) and between total tweets and new deaths (r=0.8, p=0.008), but
did not find a significant correlation between sentiment scores and new cases or new deaths. In contrast,
Shim et al. [17] found some correlation between sentiment score and
newly confirmed cases, but no significant correlation between the number of tweets and the number of
confirmed COVID-19 cases. We also discovered strong correlations between the total numbers of positive,
negative, and neutral tweets and new cases and new deaths, which shows that the numbers of positive,
negative, and neutral tweets grow approximately linearly with new confirmed cases and new deaths.

The study faced some notable constraints. The majority of tweets in the dataset
were written in slang, a non-standard form of language that is commonly used in informal
communication. This presented a significant challenge, as slang words can be highly contextual
and difficult to interpret, even for native speakers. Additionally, the dataset included tweets written in
the Malaysian language, which is similar to Indonesian but has some notable differences. This made it more
challenging to accurately classify the sentiment of these tweets. Despite these constraints, the study was able
to provide valuable insights into the sentiment of the Indonesian public toward COVID-19 vaccines, which
can be useful for policymakers and public health officials in their efforts to encourage vaccine uptake.


6. CONCLUSION 

In this study, we analyzed public sentiment toward COVID-19 vaccines in Indonesia. The analysis
unveiled a diverse spectrum of sentiments within the tweets. A substantial portion reflected positivity, while 
another segment conveyed negativity, and a significant proportion maintained a neutral standpoint. The 
sentiment scores varied between different vaccines, with Sinopharm being the most preferred and 
AstraZeneca being the least preferred. The sentiment toward each vaccine also changed over time, with 
various topics influencing the sentiment scores. Additionally, we conducted a geographical analysis and 
discovered that public sentiment differed between provinces. The top-three provinces producing the most 
tweets are Jakarta, West Java, and East Java. Three provinces with the highest sentiment scores are Maluku, 
West Sumatera, and West Sulawesi. Finally, the correlation analysis revealed significant positive correlations 
between various pairs of variables, but no significant correlation was found between sentiment scores and 
new cases or deaths. 

Our findings answer all four research questions. The results suggest that public sentiment toward 
COVID-19 vaccines in Indonesia tends to be neutral or positive, but varies between vaccines and provinces. 
The study highlights the importance of monitoring public sentiment and identifying the factors that influence 
it, as this information can inform vaccine distribution and communication strategies. Future research will 
involve applying a pre-trained language model to the dataset, aiming to enhance performance rather than 
relying on lexicon-based methods. 